Unsupervised Representation Learning Using Convolutional Restricted Boltzmann Machine for Spoof Speech Detection
نویسندگان
چکیده
Speech Synthesis (SS) and Voice Conversion (VC) presents a genuine risk of attacks for Automatic Speaker Verification (ASV) technology. In this paper, we use our recently proposed unsupervised filterbank learning technique using Convolutional Restricted Boltzmann Machine (ConvRBM) as a frontend feature representation. ConvRBM is trained on training subset of ASV spoof 2015 challenge database. Analyzing the filterbank trained on this dataset shows that ConvRBM learned more low-frequency subband filters compared to training on natural speech database such as TIMIT. The spoofing detection experiments were performed using Gaussian Mixture Models (GMM) as a back-end classifier. ConvRBM-based cepstral coefficients (ConvRBM-CC) perform better than hand crafted Mel Frequency Cepstral Coefficients (MFCC). On the evaluation set, ConvRBM-CC features give an absolute reduction of 4.76 % in Equal Error Rate (EER) compared to MFCC features. Specifically, ConvRBM-CC features significantly perform better in both known attacks (1.93 %) and unknown attacks (5.87 %) compared to MFCC features.
منابع مشابه
Speech Representation Learning Using Unsupervised Data-Driven Modulation Filtering for Robust ASR
The performance of an automatic speech recognition (ASR) system degrades severely in noisy and reverberant environments in part due to the lack of robustness in the underlying representations used in the ASR system. On the other hand, the auditory processing studies have shown the importance of modulation filtered spectrogram representations in robust human speech recognition. Inspired by these...
متن کاملA Hybrid Algorithm based on Deep Learning and Restricted Boltzmann Machine for Car Semantic Segmentation from Unmanned Aerial Vehicles (UAVs)-based Thermal Infrared Images
Nowadays, ground vehicle monitoring (GVM) is one of the areas of application in the intelligent traffic control system using image processing methods. In this context, the use of unmanned aerial vehicles based on thermal infrared (UAV-TIR) images is one of the optimal options for GVM due to the suitable spatial resolution, cost-effective and low volume of images. The methods that have been prop...
متن کاملUnsupervised Feature Learning for Speech and Music Detection in Radio Broadcasts
Detecting speech and music is an elementary step in extracting information from radio broadcasts. Existing solutions either rely on general-purpose audio features, or build on features specifically engineered for the task. Interpreting spectrograms as images, we can apply unsupervised feature learning methods from computer vision instead. In this work, we show that features learned by a mean-co...
متن کاملUnsupervised Deep Auditory Model Using Stack of Convolutional RBMs for Speech Recognition
Recently, we have proposed an unsupervised filterbank learning model based on Convolutional RBM (ConvRBM). This model is able to learn auditory-like subband filters using speech signals as an input. In this paper, we propose two-layer Unsupervised Deep Auditory Model (UDAM) by stacking two ConvRBMs. The first layer ConvRBM learns filterbank from speech signals and hence, it represents early aud...
متن کاملUnsupervised Filterbank Learning Using Convolutional Restricted Boltzmann Machine for Environmental Sound Classification
In this paper, we propose to use Convolutional Restricted Boltzmann Machine (ConvRBM) to learn filterbank from the raw audio signals. ConvRBM is a generative model trained in an unsupervised way to model the audio signals of arbitrary lengths. ConvRBM is trained using annealed dropout technique and parameters are optimized using Adam optimization. The subband filters of ConvRBM learned from the...
متن کامل